Psychometric performance of the Arabic versions of the Functional Assessment of Cancer Therapy-Breast plus Arm morbidity (FACT-B + 4) in patients with breast cancer related lymphedema: cross-sectional study

Background The burden of breast cancer continues to increase, largely because of the aging and growth of the world population, and assessment of quality of life (QoL) is an important outcome measure for facilitating improved care among breast cancer survivors. The aim of this study was to evaluate evidence of reliability, validity, and responsiveness of the Arabic version of the FACT-B + 4 questionnaire among participants with breast cancer-related lymphedema (BCRL) in Saudi Arabia. Methods In a prospective cross-sectional study, 51 participants with BCRL completed the Arabic version of the FACT-B + 4. Internal consistency was assessed using Cronbach’s alpha, and test–retest reliability was assessed using the intraclass correlation coefficient (ICC) and limits of agreement according to the Bland–Altman method. Validity was examined by testing predefined hypotheses (n = 14) for both construct and known-groups validity. To investigate responsiveness, the Arabic version of the FACT-B + 4 questionnaire was administered preoperatively and 4 weeks postoperatively to participants with breast cancer (n = 34). Results Cronbach’s alpha for the Arabic FACT-B + 4 total score was 0.90, and for the subscales it ranged from 0.74 to 0.89. Test–retest reliability for the FACT-B + 4 total score and subscales was moderate to very strong (ICC 0.51–0.94). Agreement on the Bland–Altman plot was adequate, with limits of agreement of -19.24 and 22.10 points. Measurement variability was acceptable for the Arabic FACT-B + 4 total score and the ARM subscale (standard error of measurement = 5.34 and 1.34, respectively). Moderate correlations (r = 0.42–0.62) were found between the subscales of the FACT-B + 4 and the corresponding domains of the SF-36. Overall, 72% (10 of 14) of the predefined validity hypotheses were accepted. Conclusion The Arabic FACT-B + 4 has adequate psychometric properties, making it useful for assessing QoL in Arabic-speaking women with BCRL.

surgeries and radiotherapy, have been associated with an increased rate of long-term survival [3]. However, these interventions can damage the lymph nodes and/or vessels, leading to accumulation of protein-rich fluid and development of breast cancer-related lymphedema (BCRL) [3][4][5].
The prevalence of BCRL varies in the literature and depends on treatment regimens [6], definition, methods of clinical detection, and measurement techniques [7]. A recent review reported that one in five women who survive breast cancer will develop arm lymphedema [8]. In Saudi Arabia, BCRL has been a rising condition over the last 10 years, with an estimated incidence of 14.5%, which underscores the significance of this condition [9].
The most obvious symptoms of BCRL are swelling of the affected limb, heaviness, tightness, and stiffness [10]. In addition, lymphedema can lead to physical impairments such as decreased strength, limited range of motion, and fatigue, which can cause activity limitations and functional impairment in the affected arm, as well as negative body image, depression, and anxiety, all of which might negatively influence quality of life (QoL) [11][12][13]. Therefore, QoL is an important outcome measure to facilitate improved care of those with BCRL.
Several self-reported questionnaires have been used to detect the influence of BCRL on physical, functional, and social aspects of life among patients with BCRL [14][15][16][17], such as the 36-Item Short-Form Health Survey (SF-36) [14], the Disabilities of Arm, Shoulder, and Hand (DASH) [15], and the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core (EORTC QLQ-C30) [17]. However, these instruments are generic or cancer specific rather than specific to lymphedema, and they are not sensitive enough to detect the impact of BCRL on QoL. Therefore, specific instruments are more likely to track changes in QoL [16].
One of the most widely used QoL questionnaires designed for patients with BCRL is the Functional Assessment of Cancer Therapy-Breast plus Arm morbidity (FACT-B + 4). The scale contains subscales that address physical, social/family, emotional, and functional well-being, with additional concerns related to breast cancer and an arm subscale [18]. The FACT-B + 4 is a simple, short, self-administered questionnaire. The FACT-B was originally drafted and validated in English by Brady et al. in the USA, and the arm subscale (ARM) was then developed and incorporated into the existing FACT-B by Coster et al. [18,19]. Recently, the FACT-B + 4 questionnaire has been translated, culturally adapted, and validated for use in different languages and social environments [20][21][22].
Using a reliable and valid instrument to measure QoL, considering language and cultural differences, is crucial. To our knowledge, an Arabic version of the FACT-B + 4 is available, but evidence of its reliability and validity has not been established, and its psychometric properties still need to be estimated. Therefore, the aim of this study was to evaluate evidence of reliability, validity, and responsiveness of the Arabic version of the FACT-B + 4 questionnaire among participants with BCRL in Saudi Arabia.

Study design
A prospective cross-sectional study was carried out at the oncology and physical therapy departments of King Saud Medical City, King Fahd Medical City (KFMC), and King Faisal Specialized Hospital, Riyadh, Saudi Arabia. The institutional review boards and ethics committees of the participating hospitals approved this study.

Participants
The study was conducted with 2 samples: (1) women with unilateral BCRL (inter-limb difference of > 2 cm in any circumference), and (2) Saudi women with breast cancer who were scheduled for surgery. Each participant signed an informed consent form to participate and for publication. Inclusion criteria were age > 18 years and being a native Arabic speaker. Exclusion criteria were pregnancy, active malignancy, current infection or open wound, ongoing chemotherapy or radiotherapy, local and/or systemic diseases causing impairment or disability in the affected limb, and cognitive impairment.

Data collection procedures
The primary researcher, a certified lymphedema therapist, interviewed the eligible participants to gather sociodemographic and clinical data, including age, educational level, occupation, type of breast surgery, hand dominance, and site and duration of lymphedema, and then checked the medical records to verify the data. The breast cancer (BC) participants were evaluated before surgery and 1 month after surgery, while participants with BCRL were evaluated in two separate interviews 7 days apart. Details of the study procedures were provided verbally and in writing, and each participant then signed an informed consent form. After that, participants independently completed a battery of self-reported questionnaires comprising (1) the Arabic version of the FACT-B + 4 questionnaire and (2) the SF-36.
Prior to administration of the FACT-B + 4 Arabic version, the instrument was pilot tested on twenty BCRL participants who met the inclusion criteria of the study. The participants completed the instrument and then, during a face-to-face interview, were asked to assess its understandability, the clarity of the scoring system, and the completeness of the questionnaire [20 = 32]. The results indicated that the instructions and structure of the FACT-B + 4 Arabic version were understandable, clear, and easy to follow; no participant asked for clarification or explanation to help them respond to any of the items. However, forty percent of the participants found item B4 ("I feel sexually attractive") to be inappropriate and suggested that it be replaced with "I feel attractive (pretty)".

Functional Assessment of Cancer Therapy-Breast plus Arm morbidity (FACT-B + 4)
The FACT-B + 4 is a self-reported questionnaire designed to measure health-related quality of life in participants diagnosed with BCRL. It is composed of 40 items divided into three parts: the general cancer subscale (FACT-G), the breast-specific subscale (BCS), and the ARM subscale [17]. The FACT-G is divided into 4 subscales that measure physical well-being, social/family well-being, emotional well-being, and functional well-being. The BCS has 10 items, while the ARM subscale has four items [18]. Responses are given on a 5-point Likert scale ranging from 0 ("Not at all") to 4 ("Very much"); positively stated items are scored directly from 0 to 4 points, and negatively stated items are reverse scored [18,19]. Permission to use the FACT-B + 4 was obtained from the FACT organization (owned and copyrighted by David Cella).
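The reverse-scoring rule described above can be sketched in a few lines of Python. This is only an illustration and not the official FACIT scoring procedure: the function names, the example responses, and the flags marking which items are negatively stated are hypothetical, and handling of missing responses is not covered.

```python
def score_item(response: int, negatively_stated: bool) -> int:
    """Score one 0-4 Likert response, reversing negatively stated items."""
    if response not in range(5):
        raise ValueError("response must be an integer from 0 to 4")
    # A negatively stated item is reversed so that a higher score always
    # reflects better quality of life (0 -> 4, 1 -> 3, ..., 4 -> 0).
    return 4 - response if negatively_stated else response


def subscale_score(responses, negative_flags):
    """Sum the item scores of one subscale (no missing responses assumed)."""
    return sum(score_item(r, n) for r, n in zip(responses, negative_flags))


# Hypothetical four-item subscale: items 1, 2 and 4 are negatively stated.
example = subscale_score([0, 4, 2, 1], [True, True, False, True])
print(example)  # 4 + 0 + 2 + 3 = 9
```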

Health related quality of life assessment using 36-Item Short-Form Health Survey (SF-36)
The SF-36 is a self-report generic measure of health status. It has been culturally adapted into Arabic and has good reliability and validity among breast cancer survivors [23][24][25]. It comprises 36 items that are combined to form two main domains: physical and mental health status. Each subscale score ranges from 0 to 100, where 0 indicates the worst health status and 100 represents optimal health. The score for each subscale was computed according to the RAND 36 Health Survey manual and interpretation guide [26].

Statistical analysis
The analyses were performed using IBM SPSS Statistics Version 26. Significance was set at P < 0.05. Descriptive statistics were computed for sociodemographic data and clinical characteristics.
The Wilcoxon signed-rank test was conducted to examine systematic differences between the two interviews. Bland–Altman plots were used to assess the extent of agreement between the two measurements. Assuming that the differences follow a normal distribution, the limits of agreement (LOAs) lie within d ± 1.96 × SD, where d represents the mean difference between the two measurements and SD is the standard deviation of the paired differences [30].
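As an illustration of the limits-of-agreement calculation, the following minimal Python sketch computes d ± 1.96 × SD from paired scores. The function name and the test and retest values are hypothetical and are not study data; taking the difference as retest minus test is an assumption made here.

```python
import statistics


def limits_of_agreement(test, retest):
    """Return (lower LOA, mean difference, upper LOA) for paired scores."""
    # Paired differences, taken here as retest minus test (an assumption).
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    half_width = 1.96 * sd_diff
    return mean_diff - half_width, mean_diff, mean_diff + half_width


# Hypothetical total scores for five participants at two interviews.
test_scores = [100, 110, 95, 120, 105]
retest_scores = [102, 108, 99, 121, 103]
lower, mean_diff, upper = limits_of_agreement(test_scores, retest_scores)
print(round(lower, 2), round(mean_diff, 2), round(upper, 2))
```

In a Bland–Altman plot, each paired difference is plotted against the mean of the two scores, with horizontal lines at the mean difference and at the two limits.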
The standard error of measurement (SEM) was calculated as SEM = SD × √(1 − ICC) to evaluate the magnitude of the within-subject variation. The minimal detectable change at the 95% confidence level (MDC95) was then calculated from the SEM as MDC95 = 1.96 × SEM × √2 [27][28][29].
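The two formulas can be checked with a short Python sketch. The function names are illustrative, and the SD and ICC passed to the SEM function are hypothetical; only the SEM of 5.34 is taken from the reported results for the total score, and it reproduces the reported MDC95 of about 14.80 points.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at the 95% level: 1.96 x SEM x sqrt(2)."""
    return 1.96 * sem_value * math.sqrt(2.0)


# Hypothetical inputs: a score SD of 10 points and an ICC of 0.75.
print(sem(10.0, 0.75))  # 5.0

# Reported SEM of the Arabic FACT-B + 4 total score.
print(round(mdc95(5.34), 2))  # 14.8
```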
Potential floor and ceiling effects were assessed by calculating the percentage of participants achieving the minimum or maximum score. Floor or ceiling effects were considered present if > 15% of the participants achieved the lowest or highest possible score, respectively [31].
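The 15% criterion can be expressed as a small Python check. The function name, the 0 to 28 score range, and the scores themselves are hypothetical and serve only to illustrate the rule.

```python
def floor_ceiling(scores, min_score, max_score, threshold=15.0):
    """Percentage of participants at the lowest and highest possible score."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        # An effect is flagged only when the percentage exceeds the threshold.
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical subscale scored 0-28: 10 of 51 participants at the maximum.
scores = [28] * 10 + [20] * 41
result = floor_ceiling(scores, min_score=0, max_score=28)
print(round(result["ceiling_pct"], 1), result["ceiling_effect"])  # 19.6 True
```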

Validity
Construct validity was evaluated in 2 ways. First, we examined the correlation of the FACT-B + 4 subscales (functional well-being, physical well-being, emotional well-being, and social/family well-being) with the corresponding domains of the SF-36 (physical functioning, role-physical, role-emotional, and social functioning, respectively). The Pearson correlation coefficient was used for normally distributed scores and the Spearman correlation coefficient for the other scores. Second, known-groups validity, which assesses whether an instrument can discriminate between two groups that are already known to differ in the variables of interest, was examined by comparing FACT-B + 4 scores between the BC participants and the women with BCRL. The Mann-Whitney U test was used to compare the FACT-B + 4 total and subscale scores between the two groups. We formulated 14 hypotheses (Table 1) for construct and known-groups validity based on the literature [31]. Correlation hypotheses were accepted when the correlation coefficient was at least 0.40. Construct validity was defined as very good if more than 90% of the 14 hypotheses were confirmed, good if 75% to 90% were confirmed, and moderate if between 40% and 74% were confirmed [29].
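The classification rule for hypothesis testing can be sketched as follows. The function name and the label for percentages below 40% are assumptions, as is the treatment of values between 74% and 75% as moderate, since the stated bands leave that interval undefined.

```python
def classify_construct_validity(confirmed: int, total: int) -> str:
    """Label construct validity from the share of confirmed hypotheses."""
    pct = 100.0 * confirmed / total
    if pct > 90:
        return "very good"
    if pct >= 75:
        return "good"
    if pct >= 40:
        # Assumption: values between 74% and 75% are also counted as moderate.
        return "moderate"
    return "below moderate"  # hypothetical label; not defined in the text


# 10 of the 14 predefined hypotheses confirmed.
print(classify_construct_validity(10, 14))  # moderate
```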
Responsiveness, the sensitivity of the scale to change over time, was evaluated in the BC group (n = 34) by comparing the change in each subscale and total score between baseline (before surgery) and 4 weeks postoperatively using the Wilcoxon signed-rank test. Based on the recommendation of Coster et al., we postulated that within 4 weeks after surgery BC participants would have poorer quality of life and higher levels of arm morbidity than before surgery [18].

Sample size estimation
G*Power software version 3.1.9.4 (University of Düsseldorf, Düsseldorf, Germany) was used to calculate the sample size for the reliability analysis with the following parameters: alpha = 0.05, power = 80%, and r = 0.40. A minimum sample size of 46 was sufficient to detect an ICC of 0.40. Allowing for a 10% drop-out rate, the sample size was increased to 51 participants [33]. A sample size of 30 was considered the minimum required for examining test-retest reliability [32].

Participants
Fifty-one participants with BCRL and 34 BC participants without lymphedema were involved in the present study. The two groups were comparable in age, educational level, body mass index, and type of breast surgery. All participants' characteristics are presented in Table 2. Table 3 shows the Cronbach alpha coefficients, ICCs, SEMs, and MDC95 values for the Arabic FACT-B + 4 total score and each subscale score in participants with BCRL.

Reliability/floor and ceiling
The Cronbach α of the Arabic FACT-B + 4 total score was 0.90, and for the subscales it ranged from 0.74 (social/family well-being) to 0.89 (FACT-BC). For test-retest reliability, 34 BCRL participants completed the questionnaire at two interviews (baseline and within 7 days). The test-retest reliability of the  The total Arabic FACT-B + 4 score had a variability (SEM) of 5.34 between the two measurements. Furthermore, the MDC95 for the Arabic FACT-B + 4 total score was 14.80 points; a decrease in the FACT-B + 4 score of 14 or more or an increase of 17 or more could be considered a clinically relevant change. The ARM subscale score had a variability (SEM) of 1.34 between the two measurements. Furthermore, the MDC95 was 3.72; a decrease or an increase in the ARM score of 4 or more could be considered a clinically relevant change.
As shown in Table 3, the Wilcoxon signed-rank test for the total score of the Arabic FACT-B + 4 and each subscale score did not show any significant differences (P > 0.05) between the two test occasions. Figure 1 shows the Bland-Altman plot of the FACT-B + 4 questionnaire. The mean difference was 1.43 (SD ± 10.54), and the limits of agreement for the total scores were -19.24 and 22.10 points. This indicates that the differences were randomly distributed around zero, with few points outside the limits.
The percentages of participants scoring at the floor or ceiling level were less than 15% for the total score of the Arabic FACT-B + 4 and for most subscales; however, we found ceiling effects of 19.6% and 15.7% for the SWB and EWB subscales, respectively.
Table 4 presents the associations between the most similar subscales of the FACT-B + 4 and the corresponding domains of the SF-36. The PWB, SWB, EWB, and FWB dimensions of the Arabic FACT-B + 4 showed moderate correlations (r = 0.42-0.62) with the corresponding SF-36 dimensions (role limitation due to physical function, social functioning, role limitation due to emotional problems, and physical function, respectively); therefore, all 4 hypotheses were accepted.
Table 5 presents the known-groups validity of the Arabic FACT-B + 4. The Arabic FACT-B + 4 total score and the FACT-B, FACT-TOI, ARM, and BC subscale scores were significantly higher (P < 0.001) for BC participants without lymphedema than for those with BCRL. Participants with and without lymphedema had comparable scores on the PWB, EWB, FWB, and FACT-G. Analysis of the ARM subscale items revealed that BCRL participants had significantly more arm problems than BC participants without lymphedema, including swelling (96.08% vs 12.19%; P = 0.001), pain (74.51% vs 24.39%; P = 0.01), arm movement (66.67% vs 17.07%; P = 0.01), numbness (66.67% vs 19.51%; P = 0.01), and stiffness (51% vs 4.87%; P = 0.01). Therefore, 6 out of 10 hypotheses were accepted. Construct validity of the Arabic FACT-B + 4 was moderate, as 72% (10 of 14) of the hypotheses were confirmed.
Table 6 presents the responsiveness to change of FACT-B + 4 scores in BC participants (n = 21) before and after surgery. A significant decline (P < 0.01) was found in the total FACT-B + 4 score and in all subscales (P < 0.05) except for EWB and BC.

Discussion
The lack of validated self-reported Arabic outcome measures in patients with breast cancer related lymphedema has restricted research. Therefore, the aim of this study was to evaluate evidence of psychometric performance of the Arabic version of the FACT-B + 4 questionnaire among participants with BCRL in Saudi Arabia.
Pilot testing showed that the Arabic FACT-B + 4 version was clear, acceptable, and understandable, with no difficulty regarding the scale's instructions and scoring system. No floor or ceiling effects were found for the Arabic FACT-B + 4 total score. These findings support the content, relevance, and comprehensibility of the Arabic FACT-B + 4 relative to the original version developed by Coster et al. [18] and show that it is suitable as a specific outcome measure for assessing Arabic-speaking women with BCRL in both research and clinical settings.
The ICCs of the Arabic version of the FACT-B + 4 total score and the scores on each subscale varied between moderate and very strong (ICC = 0.72-0.94). These results are in line with the earlier findings of Coster et al. [18], who reported a significant ICC for the English version of the FACT-B + 4 (ICC = 0.97), and with the Brazilian version (ICC = 0.86, 95% CI 0.80 to 0.90) in a group with BCRL (n = 18) [22,31]. The ICC of the ARM subscale was very strong (ICC = 0.94, 95% CI 0.89-0.97), somewhat lower than but comparable with that reported for the English version (ICC = 0.97, 95% CI 0.79-0.95) [18]. However, the Brazilian [31] and Spanish [20] versions reported lower ICCs for the ARM subscale (0.75 and 0.88, respectively). The reliability estimate of the Arabic BC  The average values of the FACT-B + 4 total score and each subscale at test and retest showed that participants with BCRL had similar scores on the Arabic FACT-B + 4 at the different administration times.
The measurement error of the Arabic FACT-B + 4 was quantified in the current study using the SEM and MDC. The reported MDC indicates that the Arabic FACT-B + 4 score needs to change by at least 14.80 points for the total scale and 3.72 points for the ARM subscale for the change to be considered a true change in participants with BCRL. However, none of the previous validation studies examined minimal detectable change, as recommended by Lexell and Downham [28]. The SEMs for the Arabic version of the FACT-B + 4 total score and each subscale were somewhat smaller than those of the original English questionnaire [18] and than those of the PWB, EWB, FWB, BC, and ARM subscales and FACT-B + 4 total score of the Brazilian version [31]. There are several differences in the conditions for examining reliability that could influence the test results. Various time intervals were selected between test and retest. In the current study, the interval was 1 week, based on the original English version [18] and the Spanish version [20], while it was 2 days in the Italian version, among participants who underwent breast cancer surgery [21], and 30 days in the Brazilian version [31]. All patients completed the Brazilian version under supervision at both test and retest, whereas the Arabic questionnaire was completed without supervision, since the FACT-B + 4 is self-administered. In addition, the samples varied, including participants with recent breast surgery in the Italian version [21] and a mixed group of BC participants with and without lymphedema [31]. These differences reflect non-homogeneity in responses to the scale rather than an equivalence problem of the Arabic version.
Construct validity was tested in 2 ways and gave good results in the patients with BCRL. The PWB, SWB, EWB, and FWB scores of the Arabic FACT-B + 4 had moderate correlations (between 0.42 and 0.62) with the expected domains of the SF-36. Other studies found comparable [20] or slightly lower [31] correlations between their questionnaire and a previously validated questionnaire. The FACT-B + 4 showed correlations between 0.31 and 0.41 with similar domains in the SF-36. On the other hand, the World Health Organization Quality of Life-BREF (WHOQOL-BREF) correlated well with the FACT-B + 4: the physical health domain of the WHOQOL-BREF with the PWB scale (r = 0.69), the social relationships domain with the SWB scale (r = 0.62), and the psychological domain with the EWB scale (r = 0.61) [31]. Furthermore, these results are comparable with those of Martínez et al., who reported a significant correlation of the SF-36 questionnaire with the FACT-B + 4 except for vitality and social function [20]. However, Coster et al. did not perform this analysis [18].
Concerning the second method of construct validity analysis, participants with BCRL had a significantly lower total score on the Arabic version of the FACT-B + 4 and lower scores on other subscales, including the ARM subscale. These results indicate that the ARM subscale, the breast cancer concerns scale, and the FACT-B + 4 were able to discriminate between participants with arm morbidity and those without arm problems. This finding confirms the results of Coster et al. [18], who found significant differences in mean scores between participants with and without lymphedema.
This study found that breast cancer patients reported a significant decline in QoL in terms of the physical, functional, social, and ARM subscales as well as the FACT-G, FACT-TOI, FACT-B, and FACT-B + 4 total scores at 4 weeks after surgery compared with preoperative status. These findings were similar to those of Coster et al. [18], which suggest that patients suffer from arm morbidity more than 1 month after surgery. These findings suggest that the ARM subscale and the Arabic FACT-B + 4 were sensitive to changes in arm morbidity during the postoperative period.
A strength of the current study was that different aspects of the reliability, validity, and responsiveness of the FACT-B + 4 were investigated following the recommended methods and preferred statistical analyses outlined by the COSMIN group [34]. Our study had some limitations. The inclusion criteria admitted the largest possible number of women with breast cancer-related lymphedema, regardless of severity and stage. The wide variety in type of surgery and time since surgery may be a limitation, because a sample that is more homogeneous in lymphedema stage, severity, or surgery type might not show similar changes in QoL. In addition, the sample of patients with BCRL came from a central province of Saudi Arabia, which limits the generalizability of the data because of cultural differences between Saudi Arabia's provinces. However, we believe this will have a minimal effect on the generalizability of the results, because the FACT-B + 4 uses Modern Standard Arabic, the language used in books, newspapers, magazines, media, formal speech, and communications and the most common form of Arabic taught in primary education [35]. Although an adequate proportion of the validity hypotheses was accepted (10 of 14; 72%), future studies should focus on improving the remaining items that could not be shown to be effective.

Conclusion
The Arabic FACT-B + 4 version showed strong internal consistency, test-retest reliability, and moderate construct validity similar to the original questionnaire. These results may enable the Arabic FACT-B + 4 version to be used to assess quality of life in Arabic speaking women with BCRL.